[Enhancement](scanner) allocate blocks in scanner_context on demand and free them on close #19389

Merged
yiguolei merged 3 commits into apache:master from gitccl:enhance_scanner on May 23, 2023

Conversation

@gitccl

@gitccl gitccl commented May 8, 2023

Contributor

Proposed changes

Issue Number: close #19283

Problem summary

First, to reduce memory usage, blocks are no longer pre-allocated. Instead, a block is allocated lazily when the caller invokes get_free_block. When the caller hands a block back through return_free_block, the block is added to a queue for reuse, and the queued blocks are freed when the scanner_context is closed rather than when it is destructed.
Second, to limit the scanner's memory usage, we introduce a variable _free_blocks_capacity that indicates how many free blocks are currently available to the scanners. The number of scanners that can be scheduled is calculated from this value.
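
The two points above can be illustrated with a minimal C++ sketch. It is not the code from this PR: `Block`, `LazyBlockPool`, `schedulable_scanners`, the `blocks_per_scanner` parameter, and the floor-division scheduling rule are all assumptions made for illustration. Only the names get_free_block, return_free_block, and _free_blocks_capacity come from the description.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block; the real type holds columnar data.
struct Block {
    std::vector<int> rows;
};

// Illustrative pool only, not the actual scanner_context implementation.
class LazyBlockPool {
public:
    explicit LazyBlockPool(int capacity) : _free_blocks_capacity(capacity) {}

    // Reuses a queued block if one exists, otherwise allocates on demand.
    // Nothing is allocated before the first call.
    std::unique_ptr<Block> get_free_block() {
        std::lock_guard<std::mutex> guard(_lock);
        --_free_blocks_capacity;
        if (!_free_blocks.empty()) {
            std::unique_ptr<Block> block = std::move(_free_blocks.back());
            _free_blocks.pop_back();
            return block;
        }
        ++_allocated;
        return std::make_unique<Block>();
    }

    // Queues a returned block for reuse instead of freeing it.
    void return_free_block(std::unique_ptr<Block> block) {
        assert(block != nullptr);
        block->rows.clear();  // drop the contents before the block is reused
        std::lock_guard<std::mutex> guard(_lock);
        _free_blocks.push_back(std::move(block));
        ++_free_blocks_capacity;
    }

    // Frees the queued blocks at close time, without waiting for destruction.
    void close() {
        std::lock_guard<std::mutex> guard(_lock);
        _free_blocks.clear();
    }

    // Assumed policy: each scheduled scanner needs `blocks_per_scanner` blocks.
    int schedulable_scanners(int blocks_per_scanner) const {
        std::lock_guard<std::mutex> guard(_lock);
        if (_free_blocks_capacity <= 0 || blocks_per_scanner <= 0) return 0;
        return _free_blocks_capacity / blocks_per_scanner;
    }

    int queued_blocks() const {
        std::lock_guard<std::mutex> guard(_lock);
        return static_cast<int>(_free_blocks.size());
    }

    int allocated_blocks() const {
        std::lock_guard<std::mutex> guard(_lock);
        return _allocated;
    }

private:
    mutable std::mutex _lock;
    std::vector<std::unique_ptr<Block>> _free_blocks;
    int _free_blocks_capacity;  // blocks the scanners may still take
    int _allocated = 0;         // blocks actually allocated so far
};
```

In this sketch a pool created with a capacity of 4 allocates nothing up front. The first get_free_block call allocates one block, and a later call after return_free_block reuses that block rather than allocating again.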

SSB flat test

previous

  • lineorder 1.2G:
    • load time: 3s, query time: 0.355s
  • lineorder 5.8G:
    • load time: 330s, query time: 0.970s
    • load time: 349s, query time: 0.949s
    • load time: 349s, query time: 0.955s
    • load time: 360s, query time: 0.889s (pipeline enabled)

after

  • lineorder 1.2G:
    • load time: 3s, query time: 0.349s
  • lineorder 5.8G:
    • load time: 342s, query time: 0.929s
    • load time: 337s, query time: 0.913s
    • load time: 345s, query time: 0.946s
    • load time: 346s, query time: 0.865s (pipeline enabled)

Checklist(Required)

  • Does it affect the original behavior
  • Have unit tests been added
  • Has documentation been added or modified
  • Does it need to update dependencies
  • Does this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@gitccl gitccl marked this pull request as ready for review May 8, 2023 08:39
@gitccl

gitccl commented May 8, 2023

Contributor Author

run buildall

@github-actions

github-actions Bot commented May 8, 2023

Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
@github-actions

github-actions Bot commented May 8, 2023

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen

hello-stephen commented May 8, 2023

Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 33.57 seconds
stream load tsv: 422 seconds loaded 74807831229 Bytes, about 169 MB/s
stream load json: 22 seconds loaded 2358488459 Bytes, about 102 MB/s
stream load orc: 60 seconds loaded 1101869774 Bytes, about 17 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230511074731_clickbench_pr_142206.html

@gitccl gitccl force-pushed the enhance_scanner branch from 077b4c7 to ba170d4 Compare May 9, 2023 08:00
@gitccl

gitccl commented May 9, 2023

Contributor Author

run buildall

@github-actions

github-actions Bot commented May 9, 2023

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@gitccl

gitccl commented May 9, 2023

Contributor Author

run p0

@gitccl gitccl force-pushed the enhance_scanner branch from ba170d4 to e061256 Compare May 10, 2023 01:23
@gitccl

gitccl commented May 10, 2023

Contributor Author

run buildall

@github-actions

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@gitccl

gitccl commented May 10, 2023

Contributor Author

run p0

1 similar comment
@gitccl

gitccl commented May 10, 2023

Contributor Author

run p0

@yiguolei

Contributor

This may cause a large performance decrease. With this change, a block is allocated by a scanner thread and then used or released by a fragment thread. jemalloc tracks the arena that each piece of memory was allocated from and has to return the memory to that arena on release. Every thread is bound to an arena, so there will be lock or condition contention.

@github-actions

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@gitccl

gitccl commented May 11, 2023

Contributor Author

run buildall

@gitccl

gitccl commented May 12, 2023

Contributor Author

> This may cause a large performance decrease. With this change, a block is allocated by a scanner thread and then used or released by a fragment thread. jemalloc tracks the arena that each piece of memory was allocated from and has to return the memory to that arena on release. Every thread is bound to an arena, so there will be lock or condition contention.

There appears to be no performance loss in the SSB flat test. I pasted the test results in the PR description above.

@gitccl gitccl force-pushed the enhance_scanner branch from 6958505 to 48fa685 Compare May 23, 2023 07:58
@github-actions

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@gitccl

gitccl commented May 23, 2023

Contributor Author

run buildall

@hello-stephen

Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.11 seconds
stream load tsv: 442 seconds loaded 74807831229 Bytes, about 161 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 58 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 80.0 seconds inserted 10000000 Rows, about 125K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230523084143_clickbench_pr_148829.html

Comment thread be/src/vec/exec/scan/scanner_context.cpp

@yiguolei yiguolei left a comment

Contributor

LGTM

@yiguolei yiguolei merged commit 6efe6ef into apache:master May 23, 2023
@gitccl gitccl deleted the enhance_scanner branch May 24, 2023 02:15
Gabriel39 added a commit to Gabriel39/incubator-doris that referenced this pull request May 31, 2023
yiguolei pushed a commit that referenced this pull request Aug 19, 2023
yiguolei pushed a commit that referenced this pull request Aug 22, 2023

Development

Successfully merging this pull request may close these issues.

[Enhancement] allocate blocks in scanner_context on demand and free them timely

3 participants